NetHavoc vs Harness CE

Introduction

Cavisson Systems’ NetHavoc is a unique implementation of core and advanced chaos engineering concepts that span the infrastructure, network and application layers. Conducting chaos experiments across these layers allows DevOps, SRE, QE and Development teams to accurately assess the resilience of their systems and applications. With extensive integrations with notification and ITSM tools, NetHavoc provides an enterprise-ready implementation of a mature chaos engineering platform.

NetHavoc vs Harness CE

NetHavoc offers a wide variety of chaos experiments across the infrastructure, network and application levels. Built-in integration with Cavisson’s performance testing and observability solution lets teams analyze, in a few clicks, the impact of chaos experiments under production-level load on end user experience by capturing actual user sessions and viewing application performance across individual transactions, pages and sessions.

NetHavoc provides chaos experiments, or havocs, across a wider application landscape than Harness currently supports. This helps development teams identify code-level areas of improvement in an increasingly diverse and evolving ecosystem of development frameworks.

Detailed Comparison

The following section provides an in-depth comparison between NetHavoc and Harness CE. The differentiators are highlighted to easily identify features/components that set NetHavoc apart from Harness CE.

| Supported Deployment Method(s) | NetHavoc | Harness CE |
| --- | --- | --- |
| SaaS | Yes | Yes |
| On-Premise/Self-Managed | Yes | Yes |
| Air-Gapped | Yes | Yes |

| Supported Chaos Experiments | Description | NetHavoc | Harness CE |
| --- | --- | --- | --- |
| CPU Burst | Consumes CPU cores or CPU utilization % | Yes | Yes |
| Disk Swindle | Fills up disks on the server | Yes | Yes |
| I/O Shoot Up | Increases I/O activity on the devices | Yes | Yes |
| Memory Outlay | Increases RAM utilization | Yes | Yes |
| Abort Application | Aborts application by process name or process ID | Yes | Yes |
| Terminate Cloud Instance | Kills instances of cloud machines | Yes | Yes |
| Kill Server | Shuts down or reboots the machine | Yes | Yes |
| Teleport | Changes system time to the past or the future | Yes | Yes |
| Kafka & JMS Distortion | Impacts Kafka topics or JMS queues with message influx | Yes | No |
| Intrude Network | Corrupts packets over interface | Yes | Yes |
| Trim Network Packets | Induces packet loss over interface | Yes | Yes |
| Dormant Network | Induces delay in network traffic | Yes | Yes |
| DNS Breakdown | Rejects calls to DNS server | Yes | Yes |
| Alter Inbound Services | Induces delays & failures in service transactions | Yes* | No [1] |
| Alter Outbound Services | Induces delays & failures in callouts to outbound services | Yes* | No [2] |
| Method Invocation | Delays method execution time | Yes | No |
| Method Exception | Generates exceptions in application methods | Yes | No |
| Heap Memory Leak | Increases JVM heap utilization | Yes | No |
| Application CPU Burst | Causes a spike in CPU utilization via the application process | Yes | Partially supported in VMware & Kubernetes |
| Thread Leak | Creates threads to analyze applications’ processing | Yes | No |
| Application Kill | Terminates running application with customized error(s) | Yes | Yes |

* Microservice design patterns like Circuit Breaker and Bulkhead can be tested with these features.

[1]: Harness does not induce delay directly; instead, it uses an intermediate proxy server. Further, the delay is induced at the node/pod level, and delay on individual business transactions/services is not supported. In contrast, NetHavoc does not need a proxy: it injects delay directly inside the JVM. The delay can also be injected at a more granular level, i.e. on individual business transactions/services.

[2]: Same as [1] above.
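To make the distinction in footnote [1] concrete, the following is a minimal Python sketch of in-process fault injection scoped to a single named transaction. It is an illustration of the general idea only and not NetHavoc’s implementation, which operates inside the JVM. The names `FAULTS`, `transaction`, `InjectedFailure`, `checkout` and `search` are hypothetical and chosen for this example.

```python
import time
from functools import wraps

# Hypothetical fault table: transaction name -> fault settings.
# An absent entry means the transaction runs untouched.
FAULTS = {}


class InjectedFailure(RuntimeError):
    """Raised when a failure is injected into a transaction."""


def transaction(name):
    """Mark a function as a named business transaction that can receive faults."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            fault = FAULTS.get(name, {})
            delay = fault.get("delay_seconds", 0.0)
            if delay > 0:
                time.sleep(delay)  # delay is added in-process, for this transaction only
            if fault.get("fail", False):
                raise InjectedFailure(f"injected failure in {name}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@transaction("checkout")
def checkout(order_id):
    return f"order {order_id} confirmed"


@transaction("search")
def search(term):
    return f"results for {term}"


def timed(func, *args):
    """Return (result, elapsed_seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    FAULTS["checkout"] = {"delay_seconds": 0.2}  # slow down checkout only
    print(timed(checkout, 42))
    print(timed(search, "shoes"))
```

In this sketch only `checkout` is slowed, while `search` in the same process is unaffected. A proxy that delays traffic at the node or pod level would, by contrast, affect every transaction routed through it.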

| Additional Chaos Engineering Capabilities | NetHavoc | Harness CE |
| --- | --- | --- |
| Monitor complete application/system for resiliency readiness | Yes | Partially Supported via Limited Probes |
| GameDays/Chaos Scenario Management | Yes | Yes |
| Abort chaos experiments | Yes | Yes |
| Schedule chaos experiments | Yes | Yes |
| Visual Experiment Builder | Yes | Yes |
| Execute chaos experiments in parallel | Yes | Yes |
| Integration with CI/CD Tools via REST API | Yes | Yes |
| Conduct chaos experiments with production level load | Yes | No |
| Native observability spanning logs & user sessions | Yes | No |
| Analyze webpage/transaction performance during & after experiments | Yes | No |

| Runtime Monitoring | NetHavoc | Harness CE |
| --- | --- | --- |
| Built-in AIOps engine to assist in root cause analysis | Yes | No |
| Perform auto-remediation at infrastructure/app level | Yes | No |
| Analyze impact on end user experience | Yes | No |
| Perform diagnostic activities (Thread/Heap/TCP Dump) | Yes | No |
| Advanced alerting algorithms to detect outliers, changes, etc. | Yes | No |
| Extensive log monitoring capabilities | Yes | No |
| Native Health Metrics | Yes | Only for Kubernetes |
| Create custom metrics | Yes | No |
| Single click comparison with multiple metrics | Yes | No |
| Out of the box Relational DB monitoring (Oracle, MySQL, MSSQL, PostgreSQL, etc.) | Yes | No |
| Out of the box NoSQL DB monitoring (Cassandra, Redis, MongoDB, TSDB, Couchbase, Hadoop) | Yes | No |

| Analysis & Reports | NetHavoc | Harness CE |
| --- | --- | --- |
| Customized reporting templates & scheduling options | Yes | No |
| In-built drill-down reports to analyze infra & app level impact | Yes | No |
| Integration with ITSM tools (ServiceNow, BMC Remedy) | Yes | No |
| Integration with wide array of communication tools (Slack, Teams, Spark, BigPanda) | Yes | Partially Supported |

| Administration, Security & Governance | NetHavoc | Harness CE |
| --- | --- | --- |
| Comprehensive APIs | Yes | Yes |
| Built-in user management and authentication | Yes | Yes |
| Single Sign-On (LDAP, Okta) | Yes | Yes |
| Role-based Access Control | Yes | Yes |
| Full Audit Trails | Yes | Yes |

| Support | NetHavoc | Harness CE |
| --- | --- | --- |
| SLA Guarantee | Yes | Yes |
| Training & Support | Yes | Yes |
| Online Community | Yes | Yes |
| Unified Experience Management Platform | Yes | No |

Platform Specific Chaos Experiments Coverage

Cavisson’s NetHavoc provides extensive chaos experiment capabilities spanning the application and infrastructure levels. Its support for multiple on-premise, cloud and containerized platforms clearly distinguishes it from Harness. The sections below assess the chaos experiments supported on each of these platforms:

GCP

Harness has extremely limited chaos experiment capabilities for GCP with disk loss and instance stop as the only chaos experiments supported. On the other hand, NetHavoc provides extensive infrastructure and application level chaos experiments or havocs on applications running on GCP.
Figure: 1

Azure

On Azure, Harness provides infrastructure-level chaos experiments and a single application-level experiment, which restricts access to an application instance. NetHavoc provides a wider variety of application-level chaos experiments to accurately determine an application’s resilience in production-level scenarios.

Figure: 2

AWS

Harness provides a wider range of infrastructure and AWS service related chaos experiments but is lacking at the application level. Furthermore, Harness does not provide native observability for AWS. Apart from providing multiple application-level chaos experiments, NetHavoc offers detailed, built-in monitoring of various AWS services. This gives organizations a 360-degree view of the distributed application ecosystem and a better understanding of the extent of an experiment’s impact on both application and infrastructure resiliency. Moreover, additional infrastructure/service-level AWS chaos experiments are planned for the upcoming quarters as part of the product roadmap, with the aim of providing unmatched AWS coverage via NetHavoc.
Figure: 3

Kubernetes

Harness provides a larger number of chaos experiments for Kubernetes than NetHavoc, but these are at the infrastructure level. NetHavoc takes a more granular approach at the application level, where users can inject havoc(s) into individual transactions or services. As with AWS, NetHavoc provides a wider range of observability metrics for Kubernetes than Harness. This level of detailed insight is essential to understanding the impact of resiliency testing initiatives on the application and its underlying components in a microservice-oriented application landscape. Container, node, pod and control plane level metrics are all covered under Cavisson’s native observability, giving organizations comprehensive insight into each component’s preparedness during outages.
Figure: 4

Pivotal Cloud Foundry/Tanzu Application Service

Harness has a single chaos experiment available for PCF/TAS, whereas NetHavoc provides both system and application level chaos experiments along with built-in monitoring capabilities for applications deployed on Cloud Foundry. The monitoring module covers numerous integral Cloud Foundry services, such as Auctioneer, Nozzle, GoRouter, Controller and File Server, to provide a holistic view of how your system and application respond to chaos experiments.
Figure: 5

Linux

As observed with other platforms, Harness provides chaos faults only at the system level on Linux, whereas NetHavoc’s chaos experiment capabilities cover both the application and infrastructure layers and also support experiments for Kafka and JMS based MQs.
Figure: 6

Windows

NetHavoc provides resource and application level chaos experiments/havoc(s) for Windows on both VMs and on-premise machines. Harness, on the other hand, does not support any chaos experiments for on-premise Windows machines, and its resource level chaos experiments are limited to Windows-based VMware VMs.

Due to this constraint, organizations with on-premise Windows servers cannot use Harness and would require additional chaos experiment tools to carry out resiliency testing of their critical Windows-based application(s)/infrastructure.

Conclusion

NetHavoc allows organizations and teams to conduct chaos experiments in conjunction with production-level traffic and extensive observability capabilities across applications, user sessions, logs, and infrastructure. Traditional methodologies that calculate resiliency scores with negligible observability insights and without appropriate user load fall well short of accurately depicting your mission critical application’s resiliency.

Figure 7: NetHavoc’s unique implementation combining chaos experiments with built-in load generation & observability

The above diagram illustrates how a unifying signal across various components (load, chaos experiments & observability) is fundamentally required to accurately drill down to the exact root cause behind issues being observed after conducting chaos experiments. Without this common signal, it becomes virtually impossible for traditional chaos engineering tools to gauge the extent and duration of KPI degradation without integrating multiple tools for application, user experience & log monitoring along with performance testing solutions.
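As a rough illustration of why a shared timeline matters, the sketch below computes the extent and duration of degradation for one KPI once load, chaos and monitoring samples sit on a common clock. It is a simplified Python example under stated assumptions, not NetHavoc’s algorithm: the function name `degradation`, the 20% threshold, and the sample data are all hypothetical, and the duration is taken as the span between the first and last degraded samples.

```python
from statistics import mean


def degradation(samples, chaos_start, threshold_pct=20.0):
    """Summarize KPI degradation after a chaos experiment starts.

    samples: list of (timestamp_seconds, response_time_ms) on one shared clock.
    The baseline is the mean of all samples taken before chaos_start, so at
    least one pre-chaos sample is required.
    Returns (baseline, peak_pct_increase, degraded_seconds).
    """
    baseline = mean(v for t, v in samples if t < chaos_start)
    limit = baseline * (1 + threshold_pct / 100)
    degraded = [(t, v) for t, v in samples if t >= chaos_start and v > limit]
    if not degraded:
        return baseline, 0.0, 0
    peak_pct = (max(v for _, v in degraded) - baseline) / baseline * 100
    # Assumption: span from the first to the last sample above the limit.
    duration = degraded[-1][0] - degraded[0][0]
    return baseline, peak_pct, duration


if __name__ == "__main__":
    # Hypothetical response times sampled every 10 seconds; chaos starts at t=30.
    samples = [
        (0, 100.0), (10, 100.0), (20, 100.0),
        (30, 250.0), (40, 300.0), (50, 150.0), (60, 100.0),
    ]
    print(degradation(samples, chaos_start=30))
```

If the load generator, the chaos tool and the monitoring tool each kept their own clocks and identifiers, `chaos_start` and the sample timestamps could not be compared directly, which is the integration burden described above.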

Providing an extensive array of chaos experiment capabilities across the infrastructure and application layer becomes essential to accurately judge your IT ecosystem’s resiliency. Without this level of experiments spanning the entire spectrum, organizations cannot be prepared for outages seen in production as their resiliency preparedness remains limited.

Cavisson Systems’ NetHavoc elevates resiliency testing to resiliency engineering, helping organizations and teams refocus on staying ahead of the competition instead of spending a massive amount of time figuring out the what and why behind critical issues. Current tools are not adept at providing this level of insight and correlation, and so fall well short of ensuring that your mission critical applications are resilient enough to handle unplanned outages in production.

Contact us today to view NetHavoc’s cutting edge capabilities and elevate your end user experience by building resistance to failure.
